Tidy Data

“Happy families are all alike; every unhappy family is unhappy in its own way.” –– Leo Tolstoy

“Tidy datasets are all alike, but every messy dataset is messy in its own way.” –– Hadley Wickham

Vocabulary

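Variable

In data science, the word “variable” has a different meaning than in mathematics. In algebra, a variable is an unknown quantity. In data, a variable is known: it represents a feature that has been measured or observed, a specific quantity or quality that can vary from one case to another.

Types of variables:

  - Quantitative: a number.
  - Categorical (R calls these factors): tells which category or group a case falls into.
  - All non-numerical values are categorical, but not all numerical values are quantitative (e.g. zip codes, IP addresses, dates).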

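Cases

A case is the unit of observation or analysis. What counts as a case is extremely context specific.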
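To make the vocabulary concrete, here is a minimal R sketch built on a made-up data frame (the names, heights, and zip codes are hypothetical and exist only for illustration). Each row is a case and each column is a variable. The zip code looks numeric but works as a category label, so it is stored as a factor.

```r
# Hypothetical data: three people (cases) described by three variables.
people <- data.frame(
  name      = c("Ana", "Ben", "Cai"),
  height_in = c(64.5, 70.1, 67.0),       # quantitative: arithmetic is meaningful
  zip       = c("16801", "16802", "16801")  # categorical, despite looking numeric
)

# R's type for categorical variables is the factor.
people$zip <- factor(people$zip)

n_cases     <- nrow(people)  # one row per case
n_variables <- ncol(people)  # one column per variable

str(people)
```

Averaging `height_in` makes sense; averaging zip codes would not, which is a quick way to tell a quantitative variable from a categorical one.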

What is Tidy Data

There are three interrelated rules which make a dataset tidy:

  1. Each variable must have its own column.
  2. Each observation/case must have its own row.
  3. Each value must have its own cell.

It is your job as the researcher to define the variables, observations, and values.
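The three rules can be sketched with a small made-up example in base R. The quiz scores and all object names below are hypothetical. Here the case is taken to be one student on one quiz, which is a choice the researcher makes.

```r
# Hypothetical quiz scores, made up for illustration.
# Untidy: the variable "quiz" is hidden in the column names (breaks rule 1),
# so each row holds two observations (breaks rule 2).
untidy <- data.frame(
  student = c("Ana", "Ben"),
  quiz1   = c(8, 6),
  quiz2   = c(9, 7)
)

# Tidy: one column per variable (student, quiz, score),
# one row per student-quiz observation, one value per cell.
tidy <- data.frame(
  student = rep(untidy$student, times = 2),
  quiz    = rep(c("quiz1", "quiz2"), each = nrow(untidy)),
  score   = c(untidy$quiz1, untidy$quiz2)
)

# With tidy data, a per-group summary is a single call.
mean_by_quiz <- aggregate(score ~ quiz, data = tidy, FUN = mean)
mean_by_quiz
```

If the research question treated each student as the case instead, the wide layout could be the tidy one. The code only shows one reasonable choice.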

Example of Untidy data

Example of Tidy Data

Tidy Data Example

From https://r4ds.had.co.nz/tidy-data.html

You can represent the same underlying data in multiple ways. The example below shows the same data organised in four different ways. Each dataset shows the same values of four variables (country, year, population, and cases), but each dataset organises the values in a different way.

Which of these is tidy?

Option 1

library(tidyverse)
table1

Option 2

table2

Option 3

table3

Option 4

table4a
table4b

Example Continued

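Table 1! It is the only option in which each variable has its own column, each observation has its own row, and each value has its own cell.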
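All of the options hold the same information, so the other tables can be reshaped into the layout of Table 1. The sketch below shows one way to do this with tidyr and dplyr. The intermediate names (`tidy2`, `tidy3`, `cases_long`, `pop_long`, `tidy4`) are illustrative choices, not part of the original example.

```r
library(tidyr)  # provides table1, table2, table3, table4a, table4b
library(dplyr)

# table2: the 'type' column holds variable names, so spread it into columns.
tidy2 <- table2 %>%
  pivot_wider(names_from = type, values_from = count)

# table3: 'rate' packs two values ("cases/population") into one cell.
tidy3 <- table3 %>%
  separate(rate, into = c("cases", "population"), sep = "/", convert = TRUE)

# table4a / table4b: the year values are used as column names.
cases_long <- table4a %>%
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "cases")
pop_long <- table4b %>%
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "population")

# Join the two long tables on their shared keys and restore a numeric year.
tidy4 <- left_join(cases_long, pop_long, by = c("country", "year")) %>%
  mutate(year = as.integer(year))
```

Each result has the same four columns as `table1`: `country`, `year`, `cases`, and `population`.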

Galton Data

In the 1880s, Francis Galton began developing a mathematical theory of evolution.

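Here’s part of a page from his lab notebook. Discuss the following in groups:

  - What might he investigate with these data (e.g., research question)?
  - Are these data tidy according to our definition?
  - What are the cases?
  - What are the variables?
  - How many rows of data should the result have?
  - How many columns of data should the result have? What is the data type of each column?
  - What are some additional variables (not yet shown) that might be of interest? How would you recommend showing that information in the data table?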

A page from Francis Galton’s notebook.

Activity 01: Tidy Data

Work to put the following tables into tidy form.

Table 1: Galton’s height measurements data

A page from Francis Galton’s notebook.

Table 2: Presidents

Code Books

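What is a code book?

A codebook describes the contents, structure, and layout of a data collection. A well-documented codebook contains information intended to be complete and self-explanatory for each variable in a data file.

See https://www.icpsr.umich.edu/web/ICPSR/cms/1983 for more detail. For a real-world example, see the Federal Election Commission bulk data: https://www.fec.gov/data/browse-data/?tab=bulk-data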
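A code book can itself be kept as a small table with one row per variable. The sketch below builds one for `table1` from tidyr. The column names (`variable`, `type`, `description`) and the description wording are assumptions made for illustration, not an official format or the dataset's documentation.

```r
library(tidyr)  # provides table1

# A minimal code book: one row per variable in table1.
# Descriptions are illustrative wording, not official documentation.
codebook <- data.frame(
  variable = names(table1),
  # Record the first class of each column as its storage type.
  type = unname(vapply(table1, function(col) class(col)[1], character(1))),
  description = c(
    "Country name",
    "Calendar year of the record",
    "Number of tuberculosis cases reported",
    "Population of the country"
  )
)

codebook
```

Deriving `variable` and `type` from the data keeps the code book in step with the file it describes. Only the descriptions need to be written by hand.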

References

https://dtkaplan.github.io/DataComputingEbook/chap-tidy-data.html#chap:tidy-data

https://r4ds.had.co.nz/tidy-data.html

https://www.icpsr.umich.edu/web/ICPSR/cms/1983